Text File | 1996-09-23 | 15.4 KB | 331 lines | [TEXT/ttxt] |
- Meta-Content Format
-
- R.V.Guha
- Apple Computer
-
- This draft provides a description of the Meta-Content Format (MCF), the
- interchange format used by ProjectX (aka HotSauce). This document is
- intended to evolve rapidly based on feedback from the Internet
- community and on our experience with applications of MCF.
-
- Goals of the MCF
-
- The goal of MCF is to provide an adequate language for representing a wide
- range of information about content. The content targeted includes web pages,
- gopher and ftp files, desktop files, email and structured (i.e., relational
- and object-oriented) databases, etc. The corresponding meta-content includes
- indices such as Yahoo!, gopher and ftp directory structures, email headers,
- data dictionaries, etc.
-
- ProjectX (aka HotSauce) is just one of the applications enabled by
- the MCF. It should be possible for many different applications to use the
- meta-content represented in the MCF.
-
- Foundations of the MCF
-
- The MCF has its origins in knowledge representation languages such as CycL,
- KRL and KIF. The version of MCF described in this document does not have the
- expressiveness of these languages, but hopefully, some future version will
- include the best of these languages. The expressiveness has intentionally
- been limited in version 1.0 of the MCF primarily for ease of use and for
- reasons related to computational complexity.
-
- MCF is not intended to be an extension of markup languages such as HTML or
- SGML. While it is possible and sometimes useful to embed meta-content within
- HTML files, we believe that for many purposes it would be better to extract
- and independently represent this meta-content. MCF is intended to be a
- format for this representation. In fact, we expect a lot of meta-content to
- be embedded in content and extracted automatically by robots that use the
- MCF to represent the results of their activities. In this spirit, MCF should
- be able to represent the meta-content that proposals such as the Dublin Core
- aim to cover.
-
- Though this draft does not address the issues of queries, updates and
- transactions, it is our goal to cover these issues in the near future.
- Project X currently incorporates some of these already, but it would be
- useful to make these part of the MCF as opposed to making them parts of
- applications based on the MCF.
-
- Overview of MCF
-
- MCF files contain descriptions of meta-content objects (MCO --- also
- sometimes referred to as "units".) A meta-content object consists of the
- following.
-
- * a unit identifier.
- * some number of slots, each with one or more values
- o depending on the slot, there may be exactly one or more than one
- value
- o the value(s) may be strings, numbers, etc. or they may be pointers
- to other objects. Pointers to other objects are represented using
- unit identifiers.
- o slot values are always sets, i.e., neither the order of the values
- nor the number of times a value occurs is significant. The
- combination of the unit, slot name and a slot value can be
- abstracted as a tuple in database terms or as a ground atomic
- formula in logic terms.
- o there is no minimal set of slots that an object should have,
- though specific applications may require certain slots to be
- present for certain kinds of objects.
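-
- A minimal Python sketch of this data model follows. The class name
- MetaContentObject and the identifiers /Dogs.mco and /Animals.mco are
- hypothetical, and slot names are stored without their trailing ':'. It
- only illustrates two points made above: slot values behave as sets, and
- every unit/slot/value combination flattens to one tuple.
-
```python
from dataclasses import dataclass, field


@dataclass
class MetaContentObject:
    """One MCF unit: an identifier plus slots whose values are sets."""
    identifier: str
    slots: dict[str, set[str]] = field(default_factory=dict)

    def add(self, slot: str, value: str) -> None:
        # Values are kept in a set, so adding a value twice changes nothing.
        self.slots.setdefault(slot, set()).add(value)

    def triples(self) -> set[tuple[str, str, str]]:
        # Each (unit, slot, value) combination is one database tuple,
        # or equivalently one ground atomic formula.
        return {(self.identifier, slot, value)
                for slot, values in self.slots.items()
                for value in values}


dogs = MetaContentObject("/Dogs.mco")  # hypothetical relative identifier
dogs.add("name", "Dogs")
dogs.add("genls", "/Animals.mco")
dogs.add("genls", "/Animals.mco")  # duplicate value, no effect
print(sorted(dogs.triples()))
```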
-
- MCF is an interchange format and does not make any assumptions about how
- information in this format is used by applications. It is our goal, however,
- to aid the following:
-
- * there may be many sources (servers) of meta-content about a particular
- piece of content. It should be possible for a client to mechanically
- integrate these different pieces of meta-content. E.g., there may be
- multiple agencies publishing "subjective ratings" of a certain web
- page. A client (such as Project X) should be able to obtain these
- ratings from the different agencies and associate them all correctly
- with the same page.
- * it should be possible to separate out the meta-content from the content
- itself. A user may have meta-content about hundreds of thousands of
- files, web pages, etc. on his/her desktop and use this to decide what
- content they actually want to access. It will rarely be possible for
- the user to actually store that much content on his/her desktop.
- * it should be possible for the user to edit meta-content (for personal
- use) obtained from other sources. An example of this editing is the
- reorganization of hierarchies obtained from a source such as Yahoo!
- * it should be possible to get incremental updates. It should also be
- possible for these incremental updates to be reconciled with the edits
- made by the user in a consistent fashion.
- * different applications may recognize different sets of slots. For
- example, the current version of Project X is primarily meant for
- navigation around directed graphs where the nodes correspond to either
- topics or pages (addressable via urls). (Future versions will include
- support for other slots dealing with subjective ratings of content,
- mirror sites, etc.)
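-
- The first goal, integrating meta-content about the same content from
- several sources, can be sketched by keying everything on the unit
- identifier and taking the union of slot values. This assumes the sources
- have already been parsed into plain dictionaries. The url and the rating
- values are placeholders (the syntax of rating values is not yet defined),
- and reconciling updates with user edits is not attempted here.
-
```python
from collections import defaultdict

# Hypothetical in-memory form of one source:
#   unit identifier -> slot name -> set of values
Source = dict[str, dict[str, set[str]]]


def merge_sources(*sources: Source) -> Source:
    """Integrate several sources by unioning slot values per unit identifier."""
    merged = defaultdict(lambda: defaultdict(set))
    for source in sources:
        for unit, slots in source.items():
            for slot, values in slots.items():
                merged[unit][slot] |= values
    # Return plain dicts so the result compares and prints cleanly.
    return {unit: dict(slots) for unit, slots in merged.items()}


page = "http://www.example.com/famous-dogs.html"  # placeholder url
agency_a = {page: {"rating": {"agency-a:4"}}}
agency_b = {page: {"rating": {"agency-b:good"}, "name": {"Famous Dogs Page"}}}

merged = merge_sources(agency_a, agency_b)
print(merged[page]["rating"])  # both ratings end up on the same page
```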
-
- An Example
-
- Given below are a couple of sample MCF files with comments. The nested list
- structure outlines the hierarchy. The hierarchy given below is complete only
- with respect to the children of the category Dogs. There are potentially
- other children of the other categories.
-
- * Animals
- o Dogs
- + Famous Dogs Page
- + Best Pets Page
- + Wild Dogs
- o Cats
- + Cat Lovers Page
- + Cat Haters Page
- o Pets
- + General Pet Stuff
- + Best Pets Page
- + Cat Lovers Page
- + Pet Finders Inc.
-
- Note that two of the items under "General Pet Stuff" also occur under other
- categories. We will divide this hierarchy up into two MCF files: Animals.mcf
- and GeneralPetStuff.mcf. You should view these files using a text editor
- such as SimpleText. Animals.html and GeneralPetStuff.html are html formatted
- versions of these files.
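-
- Sharing an item between categories simply means that the item has more
- than one genls: value. The Python sketch below encodes the example
- hierarchy, keyed by display name rather than by real unit identifiers
- for readability, and lists the items that have more than one parent.
-
```python
# genls: (parent) sets for the example hierarchy, keyed by display name.
genls = {
    "Dogs": {"Animals"},
    "Cats": {"Animals"},
    "Pets": {"Animals"},
    "Famous Dogs Page": {"Dogs"},
    "Best Pets Page": {"Dogs", "General Pet Stuff"},
    "Wild Dogs": {"Dogs"},
    "Cat Lovers Page": {"Cats", "General Pet Stuff"},
    "Cat Haters Page": {"Cats"},
    "General Pet Stuff": {"Pets"},
    "Pet Finders Inc.": {"General Pet Stuff"},
}

# Items that appear under more than one category.
shared = sorted(item for item, parents in genls.items() if len(parents) > 1)
print(shared)  # ['Best Pets Page', 'Cat Lovers Page']
```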
-
- You should be able to drag this url --- Animals --- onto your Project X
- space and see portions of the above hierarchy. If you then select the node
- "General Pet Stuff" and invoke the Update Node menu item, you will get the
- complete subhierarchy under that node.
-
- MCF Syntax
-
- An MCF file contains a set of headers followed by a list of mcf object
- descriptions. Each object description starts on a new line with the token
- "unit:". An object description ends either when a new object description is
- encountered or when the end of the file is reached. The end of the file may
- be the end of the physical file or the end of the logical file. The logical
- end of the file is specified by the token end-file: appearing on a new line.
- Urls for MCF files should have the suffix ".mcf".
-
- An mcf object description has the following syntax.
- unit: < unit identifier >
- < slot-name > < value 1 > < value 2 >...
- < slot-name > < value 1 > < value 2 >...
- .
- .
- .
-
- Lines starting with the character ';' are comment lines.
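-
- A minimal Python sketch of reading object descriptions under these rules
- follows. It is an illustration, not a reference parser: it assumes each
- slot and its values fit on one line, treats a value as either a
- double-quoted string (quotes removed) or a run of non-whitespace
- characters, and skips everything before the first unit: line, headers
- included. The function names and the example.com url are hypothetical.
-
```python
import re

# A value is either a double-quoted string or a run of non-whitespace characters.
VALUE_RE = re.compile(r'"[^"]*"|\S+')


def unquote(value: str) -> str:
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value


def parse_mcf(text: str) -> dict[str, dict[str, set[str]]]:
    units: dict[str, dict[str, set[str]]] = {}
    current = None  # slots of the unit being described, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank line or comment line
        if line.startswith("end-file:"):
            break  # logical end of the file
        parts = VALUE_RE.findall(line)
        if parts[0] == "unit:" and len(parts) > 1:
            # A new object description starts here.
            current = units.setdefault(unquote(parts[1]), {})
        elif current is not None and parts[0].endswith(":"):
            slot = parts[0][:-1]
            current.setdefault(slot, set()).update(unquote(v) for v in parts[1:])
        # Lines before the first unit: (e.g. headers) are ignored.
    return units


sample = """begin-headers:
MCFVersion: 1.0
name: "Animals"
end-headers:
; Dogs is a topic with no .mcf file of its own
unit: /Dogs.mco
name: "Dogs"
unit: http://www.example.com/famous-dogs.html
name: "Famous Dogs Page"
genls: /Dogs.mco
end-file:
unit: /Ignored.mco
name: "after the logical end of file"
"""
units = parse_mcf(sample)
print(sorted(units))
```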
-
- Unit Identifiers
-
- Unit identifiers are strings. Identifiers may be relative or absolute.
- Absolute identifiers are typically Uniform Resource Locators (urls).
- Relative identifiers for different objects in the same MCF file should be
- distinct. Absolute identifiers can be used to reference objects across MCF
- files.
-
- Relative identifiers can only be used to reference objects within a file.
- Relative identifiers should start with the character '/' and should have the
- suffix ".mco".
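-
- These rules can be checked mechanically. The Python sketch below
- classifies an identifier and, as one possible approach (an assumption, not
- part of the MCF), scopes relative identifiers by the url of the file that
- defines them so that objects from several files can be combined without
- collisions. The function names and urls are hypothetical.
-
```python
def classify_identifier(identifier: str) -> str:
    """Return 'relative', 'absolute' or 'malformed' for a unit identifier."""
    if identifier.startswith("/"):
        # Relative identifiers start with '/' and must end in ".mco".
        return "relative" if identifier.endswith(".mco") else "malformed"
    return "absolute"  # typically a url


def scoped_key(identifier: str, file_url: str) -> tuple[str, str]:
    """Key for combining files: relative ids are qualified by their file's url."""
    if classify_identifier(identifier) == "relative":
        return (file_url, identifier)
    return ("", identifier)  # absolute ids mean the same thing in every file


a = "http://www.example.com/Animals.mcf"
b = "http://www.example.com/GeneralPetStuff.mcf"
print(scoped_key("/Dogs.mco", a) == scoped_key("/Dogs.mco", b))  # False
```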
-
- * Project X Specifics:
-
- Project X currently deals with 2 kinds of objects --- topics (or
- categories) and content objects.
-
- The identifiers for content objects such as web pages, ftp sites, email
- addresses, etc., which are typically addressable via urls, are assumed
- to be their url.
-
- The identifier for a category/topic that has a web-accessible file
- describing its children should be the url of that file. The url should
- have the suffix ".mcf". If no such file exists, the identifier should be
- a relative identifier. The earlier example illustrates this.
-
- Slots
-
- Slot names consist of non-whitespace characters and end with the
- character ':'. The syntax of the values depends on the slot. A list of slot
- values is semantically equivalent to a set, so neither the order of the
- values nor the number of times a value occurs carries any significance.
-
- * Project X Specifics:
-
- The following slots are currently used by Project X:
-
- o name: the name of the object. A string.
- o genls: identifiers of the parents of the object. Note that an
- object may have multiple parents. Also note that cycles are
- allowed.
- o spec: identifiers of the children of the object. Note that every
- spec: entry implies a genls: entry and vice versa, so it is only
- necessary to use one of the two, typically genls:.
- o default_genl_x: and default_genl_y: integers that provide
- ProjectX a hint about where the node should be placed relative to
- its parent.
- o genls_pos: [ parent_identifier x_coordinate y_coordinate].
- genls_pos: implies a genls: relationship. The coordinates are
- relative to the parent. Please note that coordinates should be
- treated as relative measures and not as absolute locations. As the
- number of children increases/decreases, the space
- contracts/expands.
-
- If an object in a certain mcf file does not explicitly specify a
- parent, the parent will default to the object whose identifier is the
- url of that mcf file.
-
- Typically, an mcf file defines a sub-hierarchy. The file itself
- corresponds to a topic node. The file may define one or more layers of
- the hierarchy under it. The immediate children of the file's topic node
- should either not specify any genls: slot or provide the url of the
- file as the value for the genls: slot. The first approach is better
- because it allows for the file to be moved around more easily.
-
- It is up to the user (i.e., the end user who is viewing the
- sub-hierarchy) to specify where in the user's global hierarchy this
- sub-hierarchy fits. The user typically does this by dragging the url of
- the mcf file onto the region of space where they want the sub-hierarchy to be
- placed.
-
- An MCF file need not provide a complete description of any object or a
- complete list of a node's children. If it does provide a complete list of
- the children of an object, or more generally, all the values of a
- certain slot for an object, this should be specified in the headers.
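-
- The parent/child rules above can be sketched in Python as follows:
- genls: and spec: links are folded into one parent map (each implies the
- other), objects with no explicit parent are attached to the file's own
- topic node, and the map is then inverted to give each node's children.
- The function name, file url and unit identifiers are hypothetical, and
- genls_pos: is left out for brevity.
-
```python
from collections import defaultdict


def children_index(units: dict[str, dict[str, set[str]]],
                   file_url: str) -> dict[str, set[str]]:
    """Map each node to its children, using genls:, spec: and the default parent."""
    parents: dict[str, set[str]] = defaultdict(set)
    for unit, slots in units.items():
        parents[unit] |= slots.get("genls", set())
        for child in slots.get("spec", set()):
            parents[child].add(unit)  # spec: implies the reverse genls:

    children: dict[str, set[str]] = defaultdict(set)
    for unit in set(units) | set(parents):
        explicit = parents[unit]
        if not explicit and unit != file_url:
            explicit = {file_url}  # default parent: the file's topic node
        for parent in explicit:
            children[parent].add(unit)
    return dict(children)


FILE = "http://www.example.com/Animals.mcf"
units = {
    "/Dogs.mco": {"name": {"Dogs"}, "spec": {"/WildDogs.mco"}},
    "/WildDogs.mco": {"name": {"Wild Dogs"}},
    "/Cats.mco": {"name": {"Cats"}},
    "http://www.example.com/famous-dogs.html": {"genls": {"/Dogs.mco"}},
}
tree = children_index(units, FILE)
print(sorted(tree[FILE]))  # ['/Cats.mco', '/Dogs.mco']
```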
-
- Headers
-
- Headers are similar to meta-content object descriptions in that they are a
- sequence of slots and values. Headers provide meta-meta-content. The
- header slots currently used are:
-
- * MCFVersion: a decimal number.
- * FileCompleteAbout: a sequence of values of the form [identifier slot].
- This is used to specify that the file contains all the values of slot
- for the object whose identifier is identifier. The earlier
- example illustrates this.
- * name: a string providing the name of the object corresponding to this
- file.
-
- The headers begin with the token begin-headers: and end with the token
- end-headers:. If the token unit: is encountered before the token
- end-headers: is encountered, an end-headers: token is assumed. Any
- characters appearing before a begin-headers: token or unit: token are
- ignored.
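-
- A Python sketch of these header rules follows. It returns header slots
- as raw strings and reads FileCompleteAbout: as a set of (identifier,
- slot) pairs. The exact lexical form of the bracketed pairs is not pinned
- down above, so the pattern used for them is an assumption, as are the
- function names and the example url.
-
```python
import re

# Assumed lexical form of one FileCompleteAbout: entry: "[identifier slot]".
PAIR_RE = re.compile(r"\[\s*(\S+)\s+(\S+?)\s*\]")


def parse_headers(text: str) -> dict[str, str]:
    headers: dict[str, str] = {}
    in_headers = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("unit:") or line.startswith("end-headers:"):
            break  # a unit: before end-headers: implies end-headers:
        if line.startswith("begin-headers:"):
            in_headers = True
        elif in_headers and line and not line.startswith(";"):
            slot, *rest = line.split(None, 1)
            headers[slot.rstrip(":")] = rest[0] if rest else ""
        # Anything before begin-headers: is ignored.
    return headers


def complete_about(headers: dict[str, str]) -> set[tuple[str, str]]:
    return set(PAIR_RE.findall(headers.get("FileCompleteAbout", "")))


sample = """stray text before the headers is ignored
begin-headers:
MCFVersion: 1.0
name: "General Pet Stuff"
FileCompleteAbout: [http://www.example.com/GeneralPetStuff.mcf spec:]
unit: /PetFinders.mco
"""
headers = parse_headers(sample)
print(complete_about(headers))
```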
-
- Additional Slots
-
- The following slots are under consideration (for addition to Project X). We
- invite feedback on this list.
-
- * description: an ascii string that describes the object.
- * instanceOf: identifiers for the mco categories (such as Topic, Desktop
- File, Web Page, Person, Corporation) that this object is an instance
- of. Project X currently only recognizes a few kinds of objects (Topic,
- Desktop object, Gopher object and WWW page) and this is implicitly
- derived from the identifier of that object.
- * mirrors: urls of mirrors of this object.
- * author_organization_name: string(s) giving the name(s) of the
- organization(s) (such as "Apple Computer") to whom the content object
- belongs. (The underscore notation is an experiment in allowing
- modifiers and qualifiers that can be added to a slot to change its
- meaning in a precise way, in a fashion similar to the Dublin Core.)
- * author_organization_mco: identifier(s) for the mco object(s)
- corresponding to the organization(s) to whom the content object
- belongs.
- * author_individual_name: string(s) giving the name(s) of the
- individual(s) who is(are) the author(s) of this content object.
- * author_individual_mco: identifier(s) for the mco object(s)
- corresponding to the individual(s) who is(are) the author(s) of this
- content object.
- * mediaTypes: The media types present in this content object. QuickTime,
- JPEG, etc. are media types and are denoted by strings such as
- "QuickTime", "JPEG", etc. This is also intended to cover the data
- representation of the object, such as Postscript file or Java Applet.
- * rating: subjective ratings of the content object. Ratings will
- follow the general scheme outlined in the PICS Rating Systems and
- Services. The syntax of the value is yet to be determined.
- * firstPublicationDate: a single string giving the first publication date
- of the object.
- * lastRevisionDate: a single string giving the last revision date of the
- object.
- * size: an integer giving the approximate size of the object in bytes.
- * availabilityStatus: a string describing the availability of the object.
- The syntax of the value is yet to be determined.
- * linksTo: identifiers for objects that this object contains hyperlinks
- to.
- * language: Strings specifying the languages (English, French, etc.) of
- the content.
- * topic: This is a refinement of genls: and will have the same
- functionality.
- * mentions: identifiers for the spatio-temporal objects (such as Boston
- or George Washington) mentioned in the content.
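-
- One way an application might exploit the underscore notation is to let a
- query for a general slot also pick up its qualified forms. The Python
- sketch below assumes that prefix-based reading, which is this example's
- interpretation rather than something the list above defines. Other
- underscored names such as genls_pos: or default_genl_x: are not
- qualified forms of an existing slot and would need to be excluded. The
- author name and slot values are made up.
-
```python
def refines(slot: str, base: str) -> bool:
    """True if slot is base itself or base with extra _qualifiers appended."""
    return slot == base or slot.startswith(base + "_")


def values_for(slots: dict[str, set[str]], base: str) -> set[str]:
    """Collect values of base and of every qualified form of base."""
    result: set[str] = set()
    for slot, values in slots.items():
        if refines(slot, base):
            result |= values
    return result


page = {
    "author_organization_name": {"Apple Computer"},
    "author_individual_name": {"Jane Doe"},  # made-up author
    "size": {"15800"},
}
print(sorted(values_for(page, "author")))  # ['Apple Computer', 'Jane Doe']
```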
-
- Once these slots are available, Project X will be able to use them to
- provide more intelligent access to content.
-
- Acknowledgements
-
- I would like to thank Jed Harris for extensive feedback on the structure and
- content of this document. These people made Project X possible.
-
- Appendix A: BNF for MCF files
-
- < mcf file > -> < headers > < mco list > end-file:
- < headers > -> begin-headers: < linebreak > < slots > end-headers: < linebreak >
- < mco list > -> < mco > < mco list > | < mco >
- < mco > -> unit: < unit identifier > < linebreak > < slots >
-
- < slots > -> < slot > < slots > | < slot >
- < slot > -> < slot name > < slot values > < linebreak >
- < slot values > -> < white space > < slot value > < slot values > | < white space > < slot value >
-
- < slot name > -> < token >:
- < slot value > -> < token > | < string > | < slot specific value >
-
- < unit identifier > -> < absolute identifier > | < relative identifier >
- < absolute identifier > -> < string >
- < relative identifier > -> character sequence beginning with the character
- '/' and ending with the suffix .mco
-
- < token > -> character sequence with no white space or linebreaks
- < linebreak > -> any sequence of standard linebreak characters (including
- '\r' and '\n')
- < white space > -> any sequence of standard white space characters
- (including '\t' and ' ')
- < string > -> character sequence starting and ending with '"'
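-
- The grammar can be turned into a small line-oriented checker. The Python
- sketch below follows the productions literally (a header block with at
- least one slot, absolute identifiers as double-quoted strings, a final
- end-file:), reads < slot values > as one or more whitespace-separated
- values, and, as an assumption, drops blank lines and ';' comment lines
- first since the grammar does not mention them. The function name and
- test data are hypothetical.
-
```python
import re

# <slot value>: a double-quoted <string> or a <token> (no white space).
VALUE = r'(?:"[^"]*"|\S+)'
# <slot>: <token>: followed by one or more (<white space> <slot value>).
SLOT_LINE = re.compile(r"\S+:(?:[ \t]+" + VALUE + r")+")
# unit: followed by a quoted absolute identifier or a /...mco relative one.
UNIT_LINE = re.compile(r'unit:[ \t]+(?:"[^"]*"|/\S*\.mco)')


def validate_mcf(text: str) -> bool:
    # Blank lines and ';' comment lines are not part of the grammar; drop them.
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith(";")]
    pos = 0

    def keyword(word: str) -> bool:
        nonlocal pos
        if pos < len(lines) and lines[pos] == word:
            pos += 1
            return True
        return False

    def slots() -> bool:
        # <slots> -> <slot> <slots> | <slot>, i.e. one or more slot lines.
        nonlocal pos
        start = pos
        while (pos < len(lines) and not lines[pos].startswith("unit:")
               and SLOT_LINE.fullmatch(lines[pos])):
            pos += 1
        return pos > start

    def mco() -> bool:
        # <mco> -> unit: <unit identifier> <linebreak> <slots>
        nonlocal pos
        if pos < len(lines) and UNIT_LINE.fullmatch(lines[pos]):
            pos += 1
            return slots()
        return False

    # <mcf file> -> <headers> <mco list> end-file:
    if not (keyword("begin-headers:") and slots() and keyword("end-headers:")):
        return False
    if not mco():  # <mco list> contains at least one <mco>
        return False
    while pos < len(lines) and lines[pos] != "end-file:":
        if not mco():
            return False
    return keyword("end-file:") and pos == len(lines)


good = """begin-headers:
MCFVersion: 1.0
end-headers:
; comment lines are skipped
unit: /Dogs.mco
name: "Dogs"
unit: "http://www.example.com/famous-dogs.html"
name: "Famous Dogs Page"
genls: /Dogs.mco
end-file:
"""
print(validate_mcf(good))  # True
```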
-